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We consider a chemical reaction network governed by mass action kinetics and composed of N 
different species which can reversibly form heterodimers. A fast iterative algorithm is introduced to 
compute the equilibrium concentrations of such networks. We show that the convergence is guar- 
anteed by the Banach fixed point theorem. As a practical example, of relevance for a quantitative 
analysis of microarray data, we consider a reaction network formed by N ~ 10 6 mutually hybridiz- 
ing different mRNA sequences. We show that, despite the large number of species involved, the 
convergence to equilibrium is very rapid for most species. The origin of slow convergence for some 
specific subnetworks is discussed. This provides some insights for improving the performance of the 
algorithm. 



I. INTRODUCTION 

Systems of coupled chemical reactions involving many 
different species, i.e. reaction networks, have been inten- 
sively studied in chemistry, physics, mathematics, and 
engineering sciences (see e.g. [ll-flOl]). If diffusion is fast 
enough and the number of molecules is sufficiently large, 
so that stochastic effects can be neglected, these systems 
can be described by a set of coupled first order ordinary 
differential equations (ODE), which govern the time evo- 
lution of the concentrations of each species. In the ODE 
description, the rates of production and consumption of 
the chemical species are given in term of mass action, 
Michaelis-Menten, or other cooperative-type kinetics Q ■ 
In such systems different types of behavior are possible, 
as for instance relaxation to a unique stationary point, 
oscillations or multistability. 

Usually the time evolution of the system can be com- 
puted through numerical integration of the ODE. How- 
ever, this method can become very slow for large reaction 
networks. In addition, the main interest is typically the 
long time behavior of the system, which in absence of 
oscillations boils down to finding the stationary (equilib- 
rium) concentrations of each of the chemical species. 

In the present work, we will describe an efficient 
method to find equilibrium concentrations for a class of 
networks which we will refer to as hetero-dimerization 
networks. In these networks the species associate to form 
dimers, which eventually break apart giving back the sin- 
gle species. The method is based on an iterative scheme, 
of which we can rigorously prove the convergence. The 
proof relies on the Banach fixed point theorem. 

The proposed method is very efficient for large reac- 
tion networks. As an example to show that convergence is 
fast, even for systems with ~ 10 6 species, we consider the 
hybridization of RNA strands. This example is of rele- 
vance, for instance, for a better quantitative understand- 
ing of the reactions underlying the functioning of DNA 
microarrays [TTl-flil]. It shows that some sequences tend 
to get effectively depleted from the solution because of 
partial complementarity with other strands. This brings 
some consequences for the design and interpretation of 
microarray experiments. 



This paper is organized as follows. In Section QI] we in- 
troduce the iterative algorithm and prove its convergence 
to the fixed point, irrespective of the initial condition. 
In Section |IIT] we show a concrete calculation for a net- 
work composed of a large number of hybridizing mRNA 
strands. Section HVl discusses the convergence rate of the 
algorithm and Section IVl concludes the paper. 



II. HETERO-DIMERIZATION REACTIONS 

A. Iterative scheme for stationary point 

We consider a set of different chemical species Ai (i = 
1,2... N) undergoing reversible association/dissociation 
reactions of the type: 

A{ + Aj — Aij . ( 1 ) 

where and Kij are the forward and reverse rates. 
The ratio of the rates must satisfy the detailed balance 
condition 

Kij/Kv = e^l RT (2) 

where AGij is the free energy of formation of the complex 
A^ (the free energy difference between the bound and 
unbound state). 

The system is considered to be well-mixed, i.e. diffu- 
sion of all species is assumed to occur on a fast time 
scale compared to reaction time scales. Furthermore, 
production and degradation are assumed to be absent. 
A configuration can then be characterised solely by the 
concentrations of all species Ci and of all dimers cy at a 
given time. In addition, the following conservation laws 
hold for every species Ai: 

Ci — Ci -\- ^ ^ Cij (3) 

3 

with Ci the constant total concentration of a species. 
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We consider mass-action kinetics, so the concentra- 
tions evolve in time according to 



dcj 
~dt 



Kj j Cjj -Kj; 



(4) 



and we are interested in the stationary solution dci/dt = 
0. A general theorem on Mass Action Reaction Net- 
works, the so-called Deficiency Zero Theorem [l|, guar- 
antees that the system (|TJ) has a unique stationary point, 
irrespective of the values of Kij and Kij . We omit the 
proof of this statement, it amounts to calculating the de- 
ficiency of the reaction network, which is an integer easily 
obtainable from the network topology (for more details 
see [H). Given a set of rates and Kij and some ini- 
tial concentrations, it is always possible to compute the 
relaxation to equilibrium by solving the ODE in Eq. Q 
numerically, for instance through discretization in small 
time steps At. However, this is a very costly procedure 
and thus very unpractical for large networks. In addition 
the discretization brings some errors scaling as powers of 
At, which accumulate during the calculation. We show 
here that the problem of finding the stationary point can 
be reformulated as an iterative problem, which is much 
more efficient and does not involve discretization approx- 
imations. 

The detailed balance condition (Eq. ©) provides some 
freedom in choosing the forward and reverse rates. Only 
their ratio needs to be fixed to guarantee convergence to 
the stationary point. One particularly interesting choice 
is Kij = 1, and thus K t j = e AG ij/ RT , Substituting 
these values in Eq.(U]) while setting dci/dt — and using 
Eq. (0 one finds: 



Ci 



which we rewrite as 



Y^e AG ^/ RT c 







1 + E, e AG v/ RT Cj 



Tj(ci, . . . Cat). 



(5) 



(6) 



where the right hand side defines a function T from the 
N-dimensional space of concentrations c = (ci,...cjv) 
into itself. 

Equations (j6|) are a set of N non-linear equations which 
must be solved to find the Cf. Using vector notation we 
write Eq. ([6]) as c = T(c). A possible way to solve this set 
is to use an iterative approach. Starting from an initial 
guess c , one can repeatedly apply the map T to obtain 



c^> = T(c(°>), . . . , c( fc+1 > = f (c<»), but it remains to 
be proven that the process converges to the fixed point. 
Indeed, the convergence in time to a unique stable fixed 
point of the kinetic equations (U]) (according to the Zero 
Deficiency Theorem) does not a priori imply the conver- 
gence of the iterated map. However, this convergence is 
guaranteed by a fixed point theorem for iterated maps, 
which we discuss next. 



B. Contraction maps 

The Banach fixed-point theorem guarantees the 
existence and uniqueness of fixed points for a class of 
maps c i y T(c) of a metric space into itself. A map 
is said to be a contraction map if any pair of arbitrary 
points is mapped to a pair of points that are closer to 
each other, i.e. if for any two given points c and c' in 
some subset fi of a metric space for which T : f2 — » O, 
one has: 



d{f{c),f{c'))<qd{c,c') 



(7) 



with d the metric (distance function) on the metric space, 
and with a so-called Lipschitz constant q < 1 . 

In order to prove that a map is a contraction one has 
to construct a suitable distance. Indeed, a map can be a 
contraction according to one distance, but not according 
to another one. We illustrate this from a simple one- 
dimensional example. Let us thus consider the map of 
the interval [0, c] into itself defined by 



no 



l + Kc 



(8) 



where c > and K > 0. This map has a unique fixed 
point since the quadratic equation c(l + Kc) = c has 
a single positive solution. In general this map is not a 
contraction for the usual Euclidean distance d E (a,b) = 
\a — b\. Take for instance c = 2, K = 1 and c = 0.1, d — 
0.2. One has d E (c,d) = 0.1, whereas d E (T(c),T((d)) = 
0.1515 .. ., which shows that two points can be mapped 
further apart from each other. 

We note that for any c, d > one has 



d E (T(T(c)),T(T(d))) = 
Kc\c-d\ 



Kc 



1 



(l + K(c + c))(l+K(d + c)) 



l+Kc ^ 1 l+Kc' 
<q\c-d\ 



(9) 



where 



Kc 



q = 



(1 + 2Kcf 



< 1 



(10) 



for any values of K and c. ([9]) and (ITOl) together imply 
that c H> /(c) = T(T(c)) is a contraction map. Existence 
of a unique fixed point for contraction maps is guaranteed 
by the 

Banach fixed point theorem - Let (X, d) be a non- 
empty complete metric space. Furthermore, let f : X — > 
X be a contraction mapping on X, i.e. there exists 
some q < 1 (called the Lipschitz constant) such that 
d(f (x) , f (y)) < qd(x,y) for all x,y € X. Then the 
map f has only one fixed point in X , and moreover, for 
any starting point xq £ X , the sequence {x n } defined by 
x n = f{x n -i) converges towards this fixed point. 

The problem with the Euclidean distance dE{x,x') = 

^2iLi(xi — x'j} 2 persists in higher dimensions, as it 
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is possible to find c, c' for which ds(T(c),T(c')) > 
cIe(c,c'). Therefore, we chose a different strategy. To 
use the Banach fixed point theorem in higher dimension, 
we constructed an appropriate metric for which we can 
explicitly prove that the map m T(c) is a contraction. 
In the one dimensional case the metric is: 



d(c, d) 



c + c' 



(11) 



We prove later that this indeed defines a metric for its 
high-dimensional generalization, first we will consider the 
one-dimensional case. We note that the presence of the 
denominator in (JTTJ) may produce a singularity when 
c, d — > 0. To avoid this problem we restrict ourselves 
to the interval X = [T(c),c]. It can be shown easily that 
T maps X into itself and that < Tic) < c so that the 
metric (fTTj) is always well-defined. X is also a compact 
space. Roughly speaking a space is compact if there are 
no points "missing" from it, either inside it or at the 
boundary. A close interval [a, b] is compact. Also a rect- 
angle or a cube in two and three dimensions are compact, 
provided all points in the borders are included. A hyper- 
cube, the iV-dimensional analog of a cube, is also com- 
pact. Recall that given N pairs of numbers (a*, 6j) (with 
a, < bi) a hypercube X contains all points (x±, X2, ■ ■ ■ xn) 
such that a.: < Xa < We have 



d(T(c),T(c')) 



K\c-c'\ 
2 + K{c + d) 



K(c + d) 
2 + K(c + d) 



^d(c,d) <qd(c,d) (12) 



where 



Kc 
1 + Kc 



< 1 



(13) 



for any value of K and c. This shows that according to 
the metric (|lll) the map c i— > T(c) is a contraction. In 
the following we will introduce a metric which is a higher 
dimensional generalization of (|11[) . 



C. Higher dimensional map 



Consider any N x N matrix with non-negative entries 
(Kij > 0). We prove that the map defined in Eq. (j6|) is 
a contraction in the space X of A^-dimensional vectors 
c = (ci, C2 . . . cat) such that all elements are £j < Ct < cj, 
where ci are fixed. Here we have defined 



1 + Tjj Kij^ 



> 



(14) 



It is easy to show that f maps X into itself. X is also 
compact. 

We consider the metric defined as 



d(c,c') 



max — 

l<i<JV Ci 



(15) 



which is a higher dimensional generalization of (flTT) . To 
prove that T is a contraction we have to show that for 
any two points in X, say c and c ', one has 



(16) 



d(f(o),f(c'))<qd(c,c') 



with Lipschitz constant q < 1. We first show that (115)) 
has the mathematical properties of a distance and then 
that flU holds. 



1. Eq. f_?5|) defines a metric 

To show that d() as defined by Eq. (|T5j) is a metric on 
the metric space X, we need to prove that for every c, c' 
and c" in X 

a) d(c, c') > and d(c, c') = iff c = c' 

b) d(c,c')=d(c',c) 

c) Triangle inequality: d(c, c') < d(c, c") + d(c", c') 

Proof: a) and b) are trivial. The triangle inequality 
requires some more work. We need to prove that 



max ■ 



< max ■ 



c» + c 



max ■ 



(17) 



We shall prove that the inequality holds for every i, thus 
that for any non-negative a, b and c one has 



\a-b\ \a-c\ \c-b\ 



(18) 



First of all we note that the inequality (|18l) is satisfied 
when a = or b = or c = 0. It is also satisfied when 
two elements are equal a = b, a = c or b = c. We have 
to consider then these different cases: (1) < c < b < a, 
(2) < c < a < b, (3) < b < c < a, (4) < b < a < c, 
(5) < a < c < b and (6) < a < b < c. However, the 
inequality (|18l) is symmetric in the exchange of a with b. 
We have to prove it only for the cases for which a > b: (1) 
< c < b < a, (3) <b < c< a and (4) < b < a < c. 

(1) < c < b < a. 
The inequality (TT5)) becomes 
a — b 



< 



(19) 



a + b a + c c + b 
which, after some elementary algebra, can be rewritten 



(b - c)[2a{b + c) + (a + c)(a + b)} > 



(20) 



This inequality is verified in the case (1) since b > c > 
and a > 0. 

(3) < b < c < a 

The inequality (fT8|) becomes 

a — o a — c c ■ 



a + b 



b a 
< — 



c + b 



(21) 
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and after some simple algebra we get 

(c-b){a- c){a-b) > 
which is satisfied for a > c > b > 0. 

(4) < b < a < c 

The inequality (fl"8|) becomes 



a — b c — a c—b 

< 



(22) 



(23) 



a + b a + c c + b 
which can be rewritten as 

{c-a)[2b(a + c) + {a + b)(c + b)] > (24) 

which is again satisfied for c > a. 

This proves that the triangle inequality is satisfied. 
Hence d defines a metric on X. 



which is close to the desired result, but it does not suffice, 
because our aim is to prove that there exists a q that is 
strictly smaller than 1 for which ()16[) is satisfied. 

To do that we proceed as follows. Let us define 
fcmax = maxjj Kij and c max = max^ cT, . We rearrange 
the denominator of the last term of (|2"5|) by adding and 
subtracting the same term as follows 



2 + E^fe+ c 7) 

3 



j 



cj + dj 



T 

3 

>(l 



Nk 



max '-•max 



)(c j + c'j) 



Nk 



maxwiiax 



Y,K i3 {cj+c'j) 



2c n 



(30) 



2. The inequality \16\) is verified 
Combining Eqs.© and (|T5|) we find 



d(f(c),f(c')) = max 



-E 3 -*« c 



E K .'-; 



1+ E 3 



A", 



A", 



T,jKij(cj - dj) 



< max 



2 + Ei 



'2 + E^ 

To proceed further we make use of the inequality 
J2i a l 



El h 



< max — 

i V h 



(25) 



(26) 



valid for a;, bi > 0. We prove this inequality in the case 
— < max — , — (27) 



from which (|26|) follows easily by repeatedly applying 
(|2T|). To verify flUJ) consider ai/6i > a 2 /& 2 - From this 
wehaveai&2 > 0,2^1 andai&2 + ai&i > a 2 bx + axbx, which 
implies ai/6i > (ax + 0,2) /{bx + b 2 ) and proves ([271) . 
Note that from ([26| it is immediately clear that 



max 



Ej -^»3 I c 3 



2 + E,-^y( 



< max 



max max ■ 



I c j — c 'j I 



E 7 ^( Cj + c ' 



d(c, c') 



< 3 2£y(e,-+c$) 
which proves that 

d(f(c),f(c'))<d(c,c') 



(28) 



(29) 



where we have defined 



Nk 



max ^m ax 



< 1 



(31) 



In deriving (|3T)]) we have used K^ < k n 
2c max which guarantees that 



and Cj + dj < 



Kij Cj 



> 



(32) 



Finally, combining (f30|) and (]25j). and followed by the 
inequality (|26|) . we obtain 

d(f{c),f(c')) < q max ^ ^ ' Cj ~ Cj ' 



< q max max 



K,. 



3 Kij^j + dA 



EjKijicj + dj] 
< qd(c,c'), 



(33) 



concluding our proof to confirm that (| 16[) indeed holds 
for all c, d in X. A Lipschitz constant q, which is required 
by the Banach fixed point theorem, is then given by (|3 1 [) . 



III. 



HYBRIDIZATION REACTIONS IN HUMAN 
TRANSCRIPTOME 



Having proven the convergence of the iterative algo- 
rithm for generic hetero-dimerization networks, irrespec- 
tive of the values of the rates, we proceed with a spe- 
cific example from biology. In this example the chemi- 
cal species are messenger RNA (mRNA) fragments taken 
from the human genome databank (details below). 

To clarify the importance of this example we recall 
briefly some facts. In order to understand the function 
of the genes in an organism, it is essential to know under 
which conditions (or in which cell types in a multicellu- 
lar organism) they are expressed, i.e. transcribed into 
single stranded mRNA. High throughput devices such as 



DNA microarrays [16[ have been extensively used for this 
type of analysis because they provide information on the 
whole transcriptome, the set of all RNAs produced by 
the cells by transcription, on a single experiment. On 
a microarray the complement of one or more fragments 
of a specific transcript, referred to as probes, are used 
as reporters. The probe sequences are covalently linked 
on a solid surface in spots. A solution containing the 
mRNA extracted from cells is deposited on the microar- 
ray surface. A transcript in solution with a sequence 
complementary to that of the probe tends to bind to it, 
a process known as hybridization, which is illustrated in 

Fig. mb). 

Typically, a large number of genes is transcribed simul- 
taneously in cells, hence mRNA extracted from biological 
samples contains many different sequences that cover a 
broad range of concentrations, reflecting the broad differ- 
ences in expression levels. Single stranded nucleic acids 
in solution tend to bind to sequences which are partially 
complementary to them, resulting in a double helical 
fragment, as illustrated in Fig. [Ha). The hybridization 
between partially complementary fragments in solution 
"competes" with the hybridization to the sequences at 
the microarray surface. 

Consider a single stranded mRNA fragment t tran- 
scribed from a given gene. If the solution contains an- 
other transcript t' which has a strong tendency to hy- 
bridize to the first fragment, both or only one of the 
two sequences may get significantly "depleted" from the 
solution. If the binding is strong enough, duplex forma- 
tion may continue until almost all fragments of the least 
abundant type are hybridized. 

As pointed out by several papers EMI the presence 
of hybridization in solution may lead to an underestima- 
tion of expression levels from microarray data analysis. 
It is therefore important to be able to quantify its ef- 
fect. This is the aim of this example discussed here. The 
equilibrium and kinetics of mutual hybridization between 
DNAs was studied before [l7|, but only for systems with 
about N ~ 10 2 sequences. 



A. The sample 

For the computation we considered a database contain- 
ing 33,457 human mRNA's sequences downloaded from 
ftp . ncbi . nih . gov/ ref seq/H_sapiens/mRNA_Prot/, 
file human. rna.fna. These transcripts have an average 
length of several thousand nucleotides. However, in 
typical biochemical assays, as for instance in microarray 
experiments the transcripts are present in shorter 
fragments of various lengths. We have chosen to divide 
the sequences into fragments of 48 nucleotides, starting 
from the 5' end of the transcript. The first fragment 
starts thus from nucleotide n\ = 1 of the given tran- 
script. The m-th fragment starts at nucleotide position 
n rn — n m —i + 8, i.e. with a shift of 8 nucleotides from 
the previous one. This procedure avoids artifacts due 
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FIG. 1. (a) Hybridization reaction in solution between par- 
tially complementary mRNA strands, (b) Hybridization reac- 
tion between mRNA strand from solution and substrate-based 
microarray DNA probe. 

to the exact location of the fragmentation point. For a 
transcript of length L the fragmentation produces thus 
L/8 fragments, rounded down. 

For the transcripts analyzed, the fragmentation pro- 
duces in total N = 3, 150, 659 different 48-mers. To each 
of these an initial concentration Cj is assigned, such that 
fragments originating from the same transcript are given 
the same concentration. Input concentrations were ob- 
tained from DNA microarray data of Human mRNA in 
different tissues, using the outputs from the data analy- 
sis algorithm discussed in Ref. [ijj]. Typical concentra- 
tions range from a few picomolars (pM) for low expressed 
genes, to nanomolar (nM) for the highly expressed genes. 

The hybridization free energies AGy , used in Eqs. (Hp 
were computed using the nearest- neighbor model [19j . 
which assumes that the stability of the double helix de- 
pends on the identity and orientation of neighboring base 
pairs. The total free energy of a hybridizing strand is 
obtained as the sum of 10 independent parameters ac- 
counting for hydrogen bonding and stacking interactions. 
In our computations we used the RNA/RNA parameters 
given in Hi, at 1M [Na+] and a temperature T = 55°C. 

B. Efficient construction of interaction matrix 

The iterative scheme presented above has a computa- 
tional cost of order N 2 : for each of the N equations of 
(|BJ) one has to compute the sum of N terms. However, 
the analysis of the terms entering in the sum in the de- 
nominator of Eq. (|6]) shows that this sum is dominated 
by a few terms corresponding to the highest values of the 
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FIG. 2. Plot of Kij for 10 randomly selected fragments. The 
data are shown in decreasing order. It is seen in this figure 
that for each fragment, there are typically a few other frag- 
ments in the solution with which there is a significant inter- 
action. Lists were generated containing all fragments which 
were complementary for at least 8 consecutive nucleotides. 
The number of 'partners' within a given free energy range 
is growing roughly exponentially, similar to what would be 
expected for randomly generated sequences. However, the 
Boltzmann factors Kij = e' 3 decrease faster, such that 
the highest few factors still dominate the sum ^ CjKij in 
the iterative scheme. Consequently, we can approximate this 
sum by truncating the list. 



hybridization free energy. Figure [2] shows a plot of the 
Kij for some randomly selected species i, as a function 
of j ranked in decreasing order. The Boltzmann factors 
Kij — e^ AGij decay rapidly as a function of j. As an 
approximation we kept only the first ten dominant terms 
for each K^, estimating that the typical error on the 
results is of a few percent. This improves the memory 
requirements of any implementation, and hence allows a 
much greater number of sequences to be present in our 
calculations. 

To build up the matrix elements efficiently we generate 
a list of all possible sequences of length I = 8 (this list has 
size A 1 and we refer to it as the primary list). We then 
run through all the mRNA sequences and generate an in- 
dex vector which maps each position on a corresponding 
element of the primary list. This is an operation of order 
N . Having found two sequences i and j that are com- 
plementary for a stretch of length I = 8, we can check if 
this complementarity can be extended to a longer stretch. 
This method is still of computational complexity of order 
./V 2 , however, with a small pref actor compared to a full 
complementarity matching. With the used method one 
ignores complementarity for stretches shorter than / = 8 
nucleotides, but these sequences would not be expected 
to cause significant hybridization anyways. The imple- 
mentation can become very efficient by using binary op- 
erations: the four nucleotide types are encoded into two 
bits, complementarity can then be easily checked by a 



FIG. 3. Initial concentrations (Ej, solid line) and equilibrium 
single stranded concentration (c*, circles) for seven selected 
transcripts. Fragments with a high total concentration are 
typically unaffected by hybridization in solution, but some of 
the fragments with a low total concentration get significantly 
depleted. Inset: zoom of the dashed zone around transcript 
7. The arrow shows a region where significant depletion has 
taken place. 



bitwise XOR operation. 



C. Results 

Once the initial concentrations Cj of the N fragments 
are fixed, we repeatedly apply the map defined in Eq. ([6]). 
As proven before, the iterative procedure converges to a 
unique fixed point, representing the equilibrium concen- 
tration of fragments which are not hybridized. 

In practice, the convergence criterion has been cho- 
sen such that the distance in concentration (using the 
distance d(c, (?) = J2i l c » — c i\/\ c i + c 'i\ f° r convenience) 
between two successive iterations is smaller than a given 
small value: d(cW, c^ k+1 ') < e with e = 1 Typically 
about 10 2 iterations are sufficient to guarantee this level 
of accuracy. 

Figure [3] shows a typical output of the computation for 
7 randomly chosen transcripts of varying lengths. The 
fragments are ordered as they are generated from the 
fragmentation procedure described above, thus the hori- 
zontal scale should be multiplied by a factor 8 to have the 
length in nucleotides. The thin solid line corresponds to 
the total concentration Ci which, as mentioned before, is 
chosen to be constant for each transcript. For the tran- 
scripts shown in Fig. [3] the initial concentrations range 
from 2.8 pM (for transcript 3) to 670 pM (for transcript 
2). The points denote the equilibrium concentration of 
single strands c* , computed from the iterative algorithm. 
The figure shows different types of behaviors for the tran- 
scripts. Transcript 2, which had the highest initial con- 
centration, is weakly affected by hybridization with other 



fragments and most of the fragments have a free concen- 
tration very close to the initial one Cj « c,. Transcript 7 is 
moderately affected by hybridization with other targets. 
A few of its fragments strongly hybridized with comple- 
mentary partners in solution such that the concentration 
of free strands can drop of several orders of magnitude. 
The other transcripts, whose initial concentration was of 
few picomolars, are strongly affected and for almost all 
fragments, the free concentration is much lower than the 
initial concentration, c, <C c.; . 

As mRNA has typically a length of several thousands 
of nucleotides, there are many possible ways in which 
probes can be selected from it (most microarrays use as 
probes of 20-50 nucleotides) . Probe design is a fundamen- 
tal step in the realization of a DNA microarray. As an 
example, the results of Fig. |3](inset) suggest that a probe 
selected in the region marked by the arrow on transcript 7 
can significantly underestimate the true expression level 
of the transcript. As the transcript fragment strongly 
hybridizes in solution (reaction (a) of Fig. [TJ a small con- 
centration of single strand is available for hybridizing to 
the microarray surface (reaction (b) of Fig. [T]). Based on 
these results, an additional important criterion for probe 
design should be the avoidance of strongly hybridizing 
transcriptomic regions. 

Computations have been performed on different sam- 
ples, differing by the values of the c-s, reflecting the dif- 
ferent expression levels expected in different human tis- 
sues. The details of the computations will be presented 
elsewhere (F. Berger, M. G. A. van Dorp and E. Carlon, 
unpublished). As in the example above, we find in the 
transcriptome some regions strongly affected by mutual 
hybridization. 




iteration 



FIG. 4. Convergence of 32 species in the human transcrip- 
tome analysis. The horizontal axis is for the number of it- 
erations, while the vertical axis shows the concentration for 
each of the 32 probes at each iteration. Concentrations ap- 
pear to be fairly stable at 200 iterations, suggesting a fair 
degree of convergence. The four thicker lines correspond to 
four concentrations that display slow convergence. 

The Banach theorem provides an estimate of the con- 
vergence rate given by q, as defined in Eq. (f5T|) . However 
this is not a very practical bound for convergence, as 
even in cases where just a few species bind very strongly 
to each other, we easily have fc max c ma x 3> 1. More real- 
istic convergence rates can be obtained from the linear 
stability analysis around the fixed point. 



IV. CONVERGENCE OF THE ITERATIVE 
PROCEDURE 

In ths section we discuss the speed of convergence of 
the iterative algorithm for the specific example of mutual 
hybridizations in the human transcriptome and more in 
general for a generic hetero-dimerization network. 



A. Convergence for human transcriptome 
hybridization analysis 

Figure @] shows the concentrations c\ of 32 randomly 
selected species as a function of the iteration number k 
for the reaction network of hybridizing mRNA fragments 
discussed in Section IIIH One notices that the conver- 
gence to the stationary value is attained for the majority 
of species after about 50 iterations. However the speed of 
convergence varies for the different species, and in partic- 
ular for three species in Fig. 2] convergence is much slower 
than average (thick lines). In all cases shown, after about 

(k) 

k « 200 iterations c\ has become mostly stationary. 



B. Linear stability analysis predictions for 
convergence 

Given N dimerizing species, we construct the matrix 

J C^T m (c ) KmnCm ^mnC m 

mn = = (1 + E^i^c*) 2 = 1 + XV K m] c* 

(34) 

where T(c) is the iterated map defined by Eq. ([5]) and 
c* denotes the fixed point. Note that J is obtained from 
the Jacobian matrix, by swapping the signs of all en- 
tries. The matrix J is non-negative in the sense that for 
all its elements J mn > 0. In the following we will use 
the notation v < w if the inequality holds for all ele- 
ments of the two vectors v and w. For a non-negative 
matrix J, a common extension to the Perron- Frobenius 
theorem [21| guarantees that there exists a (not neces- 
sarily unique) largest eigenvalue r > whose eigenvector 
<j> is non-negative, 4> > 0. This eigenvector is known as 
the Perron- Frobenius vector. The largest eigenvalue r 
determines the slowest convergence rate of the iterative 
scheme. 

We have proven that the map T is a contraction, hence 
necessarily r < 1. We can however derive a stronger 
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bound as follows. From (|34j) one derives 



\ / m 



i 



K. 



(35) 

Consider now J T , the the transpose of J. The trans- 
pose has the same eigenvalues as J, but a different eigen- 
vector. The Perron-Frobenius theorem applies also to J T 
for which J T <f>' = r<fi' and <f>' > 0. One has 



= J 1 



= $' ■ Jc* 



(36) 



where the dot indicates the scalar product. 

Working out this scalar product and using Eq. (|35p one 
finds 



< max{a n } > c 

n — ' 



where we have used the fact that 
Equation (j3"?| shows that 



r < max ■ 



1 



> and 



< 1. 



(37) 



> 



(38) 



For most networks, especially those where most values 
K mn are small and only a few ones are very large, this is 
a better bound to the convergence rate compared to that 
guaranteed by Banach's theorem (Eq. (|3T|)) as: 



max 



KmjCj 



1 



1 



< q (39) 



where q is the Lipschitz constant given by Eq. pip . 

The largest eigenvalue of the Jacobian matrix provides 
the slowest rate of convergence. The example of Fig. 0] 
shows that the convergence rate may differ for different 
species. We provide here some simple insights on possible 
origins of the slow convergence. 

Consider first reaction networks for which the dimer- 
ization process is very weak, so that the equilibrium con- 
centrations c* n differ only weakly from the total concen- 
tration c m . In this case K mn c* n <C 1, hence Eq. (|38|) 
implies fast convergence. 

More interesting is the case of strongly interacting 
networks, where slow relaxation to equilibrium is ex- 
pected. To illustrate this, consider two strongly inter- 
acting species for which dimerization is so strong that 
interaction with the rest of the network can be neglected 
in first approximation. This two species system is de- 
scribed by the equations 



ci = 



Cl 



C2 



f Kc 2 

C2 



l + K Cl 



(40) 
(41) 



where for simplicity we discard the possibility of self- 
dimerization and K = K\ 2 = K 2 \. The diagonal ele- 
ments of the Jacobian vanish (Jn = J22 = 0), therefore 



the rate of convergence (eigenvalues of J) is given by 



A = ±\/ J12J21 



(42) 



We consider now two different cases: (1) c\ = c 2 = c 
and Kc > 1 and (2) ci > c 2 and Kc 2 > 1. 

The case (1) corresponds to the two species having 
the same initial concentration and strongly dimerizing to 
each other. In equilibrium c\ = c* 2 -C c. Some simple al- 
gebra shows that the eigenvalues of the Jacobian become 



A = ±1 + C 



1 



(43) 



which implies a very slow convergence, as |A| is close to 
1. In the case (2) one finds 



A 




(44) 



which is small when ci ^ C2- This implies very fast 
convergence. 

If this type of problems would be encountered in large 
networks, we suggest to solve them by computing analyt- 
ically the equilibrium values for the subnetwork contain- 
ing only these two species. Putting the resulting equilib- 
rium concentrations as initial concentrations in the full 
network, the iterative scheme should converge fast. Fi- 
nally, we remark that problematic cases can easily be 
detected by checking whether ci « c 2 ■ If this is not the 
case, convergence will necessarily be fast. 



V. CONCLUSIONS 

We have described a novel algorithm to efficiently com- 
pute the equilibrium concentrations for a class of chemi- 
cal reaction networks. The core part of our algorithm is 
an iterative procedure, which we have shown to converge 
to the unique stable point of the system. Furthermore, 
we have analysed the convergence properties to conclude 
that convergence is typically fast, except in certain prob- 
lematic cases. These problems turn out to be easy to de- 
tect, and the solution to these problems by analytically 
solving subnetworks could eventually be implemented. 

The inspiration for considering this problem can be 
traced back to the analysis of physical models describ- 
ing experiments with DNA microarrays. Consequently, 
we have implemented our algorithm specifically to test 
whether it can be used to find the equilibrium concentra- 
tions for a system of this size, where of the order of 10 6 
different RNA fragments form a large reaction network. 
We found that convergence was quick, on the order of at 
most a few hundred iterations, and that the full compu- 
tation took only a few minutes on a mainstream desktop 
PC. 

In the present work we have restricted our analysis to 
hetero-dimerization networks with mass action kinetics. 
It would be very interesting to expand these iterative 
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algorithms to a wider class of chemical reaction networks, 
particularly for the case of genetic regulatory networks 
[2j. One factor limiting the study of these networks is 
that often the values of reaction rates are not known 
apart for very few well studied cases [2|. The steady 
state analysis has to be repeated for various input rates, 
therefore it is very important to have fast algorithms, 
which could perform this analysis efficiently. 
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